Isochoric Nucleation Detection analysis | stage 1: Import & Processing¶

As part of the PhD work of Bruno M. Guerreiro © 2024. If using this notebook, please cite the paper: https://doi.org/10.1021/acsbiomaterials.2c00075

Disclaimer: this notebook reflects lab-specific work and tacit knowledge, and the libraries it depends on change over time. The code is not actively maintained; if something is not working properly, please contact the author at bruno.guerreiro@fulbrightmail.org.


How it works:¶

  1. import the .csv file with the data to analyze
  2. change the export variable name in the last cell, and any write_image file names
  3. tinker with the debugging thresholds until satisfied, mainly the standard-deviation cutoff (max_deviations)
  4. run all cells
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objs as go
import math

$$Data$$¶

Import data here¶

In [2]:
### Import data

# new file (change this to assess one sample)
data = pd.read_csv("data/raw/water_bruno.csv")

# control
#water = pd.read_csv("data/INDE_cycles/water_tony.csv")
#%store water

Numerical transforms¶

In [3]:
### Convert time data from seconds to minutes
data['Time'] = data['Time']/60

### Scale strain by 1e4 so the values read in units of 1e-4 (axis label: Strain * 1e4)
data['Strain'] = data['Strain'] * 1e4
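
A minimal sketch of the same two conversions on a made-up three-row log (the values are illustrative, not taken from the data file). It uses `DataFrame.assign`, which returns a new frame, as one way to avoid dividing `Time` by 60 a second time when the cell is re-run:

```python
import pandas as pd

# Hypothetical log: column names mirror the notebook's 'Time' (s) and 'Strain'
toy = pd.DataFrame({"Time": [0.0, 60.0, 150.0],
                    "Strain": [0.0, 2.5e-5, -1.2e-4]})

converted = toy.assign(
    Time=toy["Time"] / 60,       # seconds -> minutes
    Strain=toy["Strain"] * 1e4,  # strain -> strain * 1e4
)

print(converted)
```

Because `toy` itself is left untouched, the conversion can be repeated safely; the in-place version in the cell above rescales the columns again on every run.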

Plot graph¶

In [4]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Add traces
fig.add_trace(
    go.Scatter(x=data["Time"], y=data["Strain"], name="Strain", line_color="black", opacity=0.85),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=data["Time"], y=data["T"], name="T (ºC)", opacity=0.5, line_color="blue"),
    secondary_y=True,
)

# Add figure title
#fig.update_layout(title_text="INDe nucleation cycles")

# Customizable templates, just change the 'template' arg. Options include:
# "plotly", "plotly_white", "plotly_dark", "ggplot2", "seaborn", "simple_white", "none"
fig.update_layout(template="simple_white", showlegend=False)

# Set x-axis title
fig.update_xaxes(title_text="<b>Time</b> (min)")

# Set y-axes titles
fig.update_yaxes(title_text="<b>Strain</b> (V/V*1e4)", secondary_y=False)
fig.update_yaxes(title_text="<b>Temperature</b> (ºC)", secondary_y=True)

fig.show()

$$Mining$$¶

Calculate local maxima¶

Adapted from https://eddwardo.github.io/posts/2019-06-05-finding-local-extreams-in-pandas-time-series/

In [5]:
from scipy.signal import argrelextrema
import numpy as np

# Work on a sign-flipped copy of 'T': the flip turns temperature minima into maxima,
# and the new column name avoids data.T, which pandas reads as the DataFrame transpose
data['Temperature'] = data['T']*-1

threshold = 1

# order: number of points on each side that a sample is compared against
ilocs_max = argrelextrema(data.Temperature.values, np.greater_equal, order=300)[0]

# create new plot variables
Nucleation_T = data.iloc[ilocs_max].Temperature * -1
Strain_at_NT = data.iloc[ilocs_max].Strain
cycle_number = np.arange(1, len(Nucleation_T)+1)


# Plot the detected nucleation points, labeled with their cycle number
fig = go.Figure()
fig.add_trace(go.Scatter(x=data.iloc[ilocs_max].Time, y=Nucleation_T, mode='lines', line_color='rgba(0,0,0,1)', line_width=1))
fig.add_trace(go.Scatter(x=data.iloc[ilocs_max].Time, y=Nucleation_T, text=cycle_number, marker_color='red', marker_size=5, marker_symbol=2,
                        mode='markers+text', textposition='top center', textfont=dict(color='black', size=7)
                        ))

fig.update_layout(template='simple_white', showlegend=False,
                 xaxis_title='Time (min)', yaxis_title='Nucleation temperature (ºC)')
fig.show()
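
To see how the sign flip and the `order` argument interact, here is a small sketch on a synthetic trace (three V-shaped cooling cycles with made-up minima of -5, -8 and -6 ºC; none of it is real INDe data). It assumes, as in the cell above, that the nucleation point is the local temperature minimum of each cycle:

```python
import numpy as np
from scipy.signal import argrelextrema

def cycle(depth, n=50):
    """One synthetic cycle: linear cooling to -depth, then linear warming back."""
    down = np.linspace(0.0, -depth, n, endpoint=False)
    up = np.linspace(-depth, 0.0, n, endpoint=False)
    return np.concatenate([down, up])

# 300 samples, 100 per cycle; each minimum sits in the middle of its cycle
temperature = np.concatenate([cycle(5.0), cycle(8.0), cycle(6.0)])

# Minima of T are maxima of -T; order=20 compares against 20 neighbours per side
minima_idx = argrelextrema(-temperature, np.greater_equal, order=20)[0]

# A window wider than one cycle (order=120) lets the deepest cycle mask its neighbours
merged_idx = argrelextrema(-temperature, np.greater_equal, order=120)[0]

print(minima_idx, temperature[minima_idx])
print(merged_idx, temperature[merged_idx])
```

With the narrow window every cycle contributes one point; with the wide window only the deepest minimum survives. This is why `order` should stay below the number of samples in one cycle.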

Get $T_{nucleation}$ data for each cycle¶

In [6]:
# One row per detected cycle; the default 0..N-1 index is kept
df = pd.DataFrame()
df['Cycle'] = list(cycle_number)
df['Tnuc'] = list(Nucleation_T)
df['Strain'] = list(Strain_at_NT)
df
Out[6]:
Cycle Tnuc Strain
0 1 -5.1498 0.25504
1 2 -6.5347 0.79206
2 3 -7.0608 1.01039
3 4 -10.7624 1.84390
4 5 -10.8480 0.00494
... ... ... ...
438 439 -11.9476 -4.86293
439 440 -11.0938 -5.03369
440 441 -11.1815 -4.69809
441 442 -10.7032 -5.06373
442 443 -10.3282 -4.67842

443 rows × 3 columns

In [7]:
print('Number of cycles: ', len(df),
     "\nAverage nucleation temperature: ", round(Nucleation_T.mean(), 2), "ºC", "+-", round(Nucleation_T.std(), 2), "ºC")
Number of cycles:  443 
Average nucleation temperature:  -0.54 ºC +- 15.83 ºC

Remove bugged cycles with no scientific significance¶

In [8]:
mean = np.mean(Nucleation_T)
standard_deviation = np.std(Nucleation_T)
distance_from_mean = abs(Nucleation_T - mean)

# adjust hyperparameter as needed
max_deviations = 1

# if the datapoint deviates too much from the mean, it gets removed
not_outlier = distance_from_mean < max_deviations * standard_deviation
no_outliers = Nucleation_T[not_outlier]


print(no_outliers)
print(len(no_outliers))
838       -5.1498
1689      -6.5347
2558      -7.0608
3595     -10.7624
4669     -10.8480
           ...   
442861   -11.9476
444105   -11.0938
445347   -11.1815
446565   -10.7032
447769   -10.3282
Name: Temperature, Length: 304, dtype: float64
304
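
The effect of `max_deviations` can be checked on a handful of made-up values before it is applied to a full run. The sketch below wraps the same mean and standard deviation rule in a hypothetical helper, `keep_within` (not part of the notebook), and tries three cutoffs on a series with one spurious warm reading:

```python
import pandas as pd

# Made-up nucleation temperatures (ºC); the 3.0 reading mimics a spurious peak
tnuc = pd.Series([-11.2, -10.8, -11.9, -10.3, -12.1, 3.0, -11.0, -10.7])

def keep_within(series, max_deviations):
    """Keep values closer to the mean than max_deviations standard deviations."""
    distance = (series - series.mean()).abs()
    # ddof=0 matches np.std, which the notebook cell uses
    return series[distance < max_deviations * series.std(ddof=0)]

for k in (1, 2, 3):
    kept = keep_within(tnuc, k)
    print(f"max_deviations={k}: kept {len(kept)} of {len(tnuc)}")
```

Note that the mean and standard deviation are computed with the outliers still included, so a few extreme readings widen the cutoff themselves; a small `max_deviations` compensates for that at the cost of trimming genuine cycles.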

Adjusted number of relevant cycles¶

In [9]:
cycle_number_adj = np.arange(1, len(no_outliers)+1)
In [10]:
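df_adj = pd.DataFrame()
df_adj['Cycle'] = list(cycle_number_adj)
df_adj['Tnuc'] = list(no_outliers)

# Range filter on Tnuc.
# As written, no value can be both below -9 and above -6, so 'water' is empty;
# adjust the bounds or the operator to the intended temperature window.
water = df_adj[(df_adj['Tnuc'] < -9)&(df_adj['Tnuc'] > -6)]

# Save as csv file (forward slashes work on every OS)
water.to_csv('data/INDe_cycles/water_bruno.csv', index=False, header=True)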
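
The range filter in this cell combines two comparisons with `&`. A toy sketch with made-up temperatures shows what each way of combining the same two bounds selects; which of them matches the intended `water` subset is not stated in the notebook, so the variable names below are only illustrative:

```python
import pandas as pd

tnuc = pd.Series([-12.0, -10.5, -8.0, -7.0, -5.0, 3.0])  # made-up values, ºC

# Both conditions at once: nothing is below -9 and above -6, so this is empty
both = tnuc[(tnuc < -9) & (tnuc > -6)]

# Strictly inside the window (-9, -6)
inside = tnuc[(tnuc > -9) & (tnuc < -6)]

# Outside the window: colder than -9 or warmer than -6
outside = tnuc[(tnuc < -9) | (tnuc > -6)]

print(len(both), inside.tolist(), outside.tolist())
```

Checking `len()` of the filtered frame before calling `to_csv` is a cheap guard against exporting an empty file.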
In [11]:
# Plot the nucleation temperatures that remain after outlier removal, labeled by adjusted cycle number
fig = go.Figure()
fig.add_trace(go.Scatter(x=df_adj['Cycle'], y=df_adj['Tnuc'], mode='lines', line_color='rgba(0,0,0,1)', line_width=1))
fig.add_trace(go.Scatter(x=df_adj['Cycle'], y=df_adj['Tnuc'], text=cycle_number_adj, marker_color='red', marker_size=5, marker_symbol=2,
                        mode='markers+text', textposition='top center', textfont=dict(color='black', size=8)
                        ))
fig.update_layout(template='simple_white', showlegend=False, title='Corrected cycle list',
                 xaxis_title='Cycle', yaxis_title='Nucleation temperature (ºC)')

fig.show()
In [12]:
df_adj['Tnuc'].describe()
Out[12]:
count    304.000000
mean     -11.200462
std        1.592527
min      -14.929100
25%      -12.063475
50%      -11.411750
75%      -10.685125
max        3.020200
Name: Tnuc, dtype: float64

Nucleation temperature distribution¶

In [13]:
import plotly.graph_objects as go

import pandas as pd

fig = go.Figure(data=go.Violin(y=df_adj['Tnuc'], box_visible=True, line_color='black',
                               meanline_visible=True, fillcolor='lightblue', opacity=0.6, points="all",
                               x0='Water', pointpos=0))

# Customizable templates, just change the 'template' arg. Options include:
# "plotly", "plotly_white", "plotly_dark", "ggplot2", "seaborn", "simple_white", "none"
fig.update_layout(template="simple_white")

fig.update_layout(yaxis_zeroline=False, width = 600,height = 600, yaxis_title='Nucleation temperature (ºC)')
fig.show()

Nucleation cycles, survival graph, violin/box plots¶

In [14]:
### SURVIVAL GRAPH CALCULATIONS

# Obtain y-axis: order nucleation temperatures from lowest to highest
ordered_Tnuc = sorted(df_adj['Tnuc'], reverse=False)

# Obtain x-axis: rank of each point (1..N) divided by the number of points N
chi = np.arange(1, len(ordered_Tnuc) + 1) / len(ordered_Tnuc)
$$\chi(T) = e^{\frac{-A}{\beta}\frac{\gamma(T-T_m)^{1+n}}{1+n}}$$
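
The notebook does not define the symbols in this expression or fit it to the data, so the following is only a sketch of how the curve could be evaluated under stated assumptions: $T_m$ is taken as the melting temperature (0 ºC), the supercooling is taken as the magnitude $T_m - T$ so that non-integer powers stay real, and the values of `A`, `beta`, `gamma` and `n` are arbitrary placeholders rather than fitted parameters. The function name `survival_fraction` is hypothetical.

```python
import numpy as np

def survival_fraction(T, A, beta, gamma, n, T_m=0.0):
    """Evaluate chi(T) = exp(-(A / beta) * gamma * dT**(1 + n) / (1 + n)).

    dT is the supercooling magnitude T_m - T, set to zero at or above T_m
    (an assumption of this sketch, not something stated in the notebook).
    """
    dT = np.clip(T_m - np.asarray(T, dtype=float), 0.0, None)
    return np.exp(-(A / beta) * gamma * dT ** (1 + n) / (1 + n))

# Placeholder parameters, chosen only to give a visible curve
T = np.linspace(0.0, -15.0, 151)
chi_model = survival_fraction(T, A=1.0, beta=1.0, gamma=1e-6, n=5.0)

print(chi_model[0], chi_model[-1])
```

Under these assumptions the curve starts at 1 at $T_m$ and decreases monotonically with deeper supercooling, which is the shape the empirical `chi` versus `ordered_Tnuc` points would be compared against.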
In [15]:
fig = make_subplots(rows=1, cols=3)

# Tnuc-cycle plot
fig.add_trace(go.Scatter(x=df_adj['Cycle'], y=df_adj['Tnuc'], mode='lines+markers', legendgroup=1), row=1,col=1)

# Survivor plot
fig.add_trace(go.Scatter(x=chi, y=ordered_Tnuc, mode='markers', marker_size=8, opacity=0.75, legendgroup=2), row=1,col=2)
#POISSON FIT HERE --- fig.add_trace(go.Scatter(x=chi, y=ordered_Tnuc, mode='lines', line_color='black', line_width=2, opacity=.35), row=1,col=3)

# Violin plot
fig.add_trace(go.Violin(y=water['Tnuc'], name='water', box_visible=True, points='all', meanline_visible=True, pointpos=0, marker_opacity=.35, opacity=.75, marker_size=5, legendgroup=3), row=1, col=3)
fig.add_trace(go.Violin(y=df_adj['Tnuc'], name='water', box_visible=True, points='all', meanline_visible=True, pointpos=0, marker_opacity=.35, marker_size=5, legendgroup=3), row=1, col=3)

# Box plot
#fig.add_trace(go.Box(y=water['Tnuc'], boxpoints='all', pointpos=0, marker_color='rgb(107,174,214)', marker_opacity=.35, marker_size=8, line_color='rgb(107,174,214)', name="without", legendgroup=4), row=1, col=4)
#fig.add_trace(go.Box(y=sample['Tnuc'], boxpoints='all', pointpos=0, marker_opacity=.35, marker_size=8, name="with", legendgroup=4), row=1, col=4)

fig.update_layout(template='simple_white', 
                  showlegend=False,
                  xaxis1_title = 'Cycles',
                  xaxis2_title = 'Unfrozen fraction',
                  xaxis3_title = 'Concentration (wt%)',
                  #xaxis4_title = 'water',
                  yaxis1_title = 'Nucleation temperature (ºC)')

fig.show()
In [16]:
# Exports for next stage

#sample = df_adj
#%store sample